Detecting very large sets of referenced files at 40/100 GbE, especially MP4 files

Authors

  • Adrien Larbanet
  • Jonas Lerebours
  • J. P. David

Abstract

Internet traffic monitoring is an increasingly challenging task because of high bandwidths, especially at Internet Service Provider routers and/or Internet backbones. We propose a parallel implementation of the max-hashing algorithm that enables the detection of millions of referenced files by deep packet inspection over high-bandwidth connections. We also propose a method to extract high-entropy signatures from MP4 files that is compatible with the max-hashing algorithm, in order to keep false positive rates low. The system first computes a set of fingerprints: small subsets of the referenced files that are, a priori, unique and easily identifiable. At detection time, the max-hashing algorithm eliminates the need to reconstruct flows. A Graphics Processing Unit (GPU) card computes the fingerprints of all IP packets in parallel and searches for hits in the onboard collection of fingerprints. Our application, dedicated to the detection of known MP4 video files, enables the detection of millions of fingerprints and demonstrates a sustained processing rate of 50 Gbps per card. Furthermore, no false positives were observed in our 28.25 GB transfer test. The proposed implementation also detects suspect flows based on IP addresses and ports so that deeper investigations can be carried out offline. © 2015 The Authors. Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/
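The abstract does not spell out how max-hashing selects fingerprints. The sketch below assumes a winnowing-style scheme: hash every k-byte substring, then keep the maximum hash in each window of w consecutive substrings. Under that assumption, the selected fingerprints depend only on content, not on where packet boundaries fall, so any window lying entirely inside one packet payload yields a reference fingerprint without reassembling the flow (windows that straddle a boundary are simply missed). The parameters `K` and `W`, the BLAKE2b hash, and the helpers `max_fingerprints`, `count_hits` and `suspect_flows` are hypothetical illustrations, not the authors' code; the paper performs the per-packet step in parallel on a GPU, whereas this is sequential Python.

```python
import hashlib
from collections import Counter
from collections.abc import Iterable

# Hypothetical parameters, not taken from the paper:
K = 16  # length of each hashed substring (k-gram), in bytes
W = 32  # number of consecutive k-grams per selection window


def kgram_hashes(data: bytes, k: int = K) -> list[int]:
    """64-bit BLAKE2b hash of every k-byte substring of data.

    A production system would use a rolling hash; BLAKE2b keeps the sketch simple.
    """
    return [
        int.from_bytes(hashlib.blake2b(data[i:i + k], digest_size=8).digest(), "big")
        for i in range(len(data) - k + 1)
    ]


def max_fingerprints(data: bytes, k: int = K, w: int = W) -> set[int]:
    """Keep the maximum hash of each window of w consecutive k-grams."""
    hashes = kgram_hashes(data, k)
    return {max(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def count_hits(payload: bytes, reference: set[int]) -> int:
    """Fingerprints of a single packet payload that appear in the reference set."""
    return len(max_fingerprints(payload) & reference)


def suspect_flows(
    packets: Iterable[tuple[tuple, bytes]], reference: set[int], threshold: int = 1
) -> set[tuple]:
    """Sum hits per flow key (e.g. IPs and ports) and return flows at or above threshold."""
    hits: Counter = Counter()
    for flow, payload in packets:
        hits[flow] += count_hits(payload, reference)
    return {flow for flow, n in hits.items() if n >= threshold}
```

For example, after building `reference = max_fingerprints(file_bytes)`, calling `count_hits` on any slice of that file at least `K + W - 1` bytes long returns a nonzero count, because every window of the slice is also a window of the file. Unrelated data returns zero in practice, since 64-bit hash collisions are negligible at this scale.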

Similar resources

A Study of Detecting Computer Viruses in Real-Infected Files in the n-Gram Representation with Machine Learning Methods

Machine learning methods have been successfully applied in recent years to detect new and unseen computer viruses. The viruses were, however, detected in small virus loader files and not in real infected executable files. We created data sets of benign files, virus loader files and real infected executable files and represented the data as collections of n-grams. Histograms of the relative frequ...

Steganalysis of OpenPuff through atomic concatenation of MP4 flags

OpenPuff is recognised as one of the leading tools in video steganography for its capability to securely hide information. This is in contrast to a number of video steganography tools that apply outdated and highly insecure techniques such as EOF data injection. However, even OpenPuff has subtle vulnerabilities that can be easily exploited. In this talk we propose a method to detect the presenc...

Cyclic fatigue behavior of nickel-titanium dental rotary files in clinical simulated root canals.

BACKGROUND/PURPOSE Dental rotary instruments can be applied in multiple conditions of canals, but unpredictable fatigue fracture may happen. This study evaluated the fatigue lives of two batches of nickel-titanium (NiTi) dental rotary files operating in clinically simulated root canals. METHODS Single-step cyclic fatigue tests were carried out to assess the performance of two batches of NiTi ...

Unknown Malicious Code Detection – Practical Issues

The recent growth in Internet usage has motivated the creation of new malicious code for various purposes, including information warfare. Today's signature-based anti-viruses can accurately detect known malicious code but are very limited in detecting new malicious code. New malicious codes are being created every day, and their number is expected to increase in the coming years. Recently, mach...

Using Visualization to Detect Plagiarism in Computer Science Classes

This paper introduces a number of general methods for visualizing commonality in sets of text files. Each visualization simultaneously compares one file in the set to all other files in the set. These visualizations, which can be computed in O(n) time and space, are explained and then applied to the problem of detecting plagiarism in large computer science classes. A case study is presented and...

Journal:
  • Digital Investigation

Volume 14

Publication date: 2015